Europäisches Patentamt
European Patent Office
Office européen des brevets

Publication number: 0 189 943 A2



EUROPEAN PATENT APPLICATION 



Application number: 86101338.1        Int. Cl.4: G 06 F 15/68

Date of filing: 31.01.86






Priority: 01.02.85 JP 16553/85
          27.09.85 JP 214163/85
          20.12.85 JP 285576/85

Date of publication of application:
06.08.86 Bulletin 86/32

Designated Contracting States:
CH DE FR GB IT LI NL SE

Applicant: HITACHI, LTD.
6, Kanda Surugadai 4-chome Chiyoda-ku
Tokyo 100(JP)

Inventor: Miura, Shuuichi
Yuhoryo 305, 20-3 Ayukawacho-[illegible]-chome
Hitachi-shi(JP)

Inventor: Kobayashi, Yoshiki
24-5, Mikanoharacho-2-chome
Hitachi-shi(JP)



Parallel image processor.

An LSI parallel image processor in which line buffers (20-i) and
data-flow switching circuits (70), each requiring a large amount of
hardware in the prior art, are incorporated into an LSI circuit. The
image data delayed by the line buffers are output from an image data
output port (55), shift registers (31-i) each having a variable number
of steps for preserving local image regions are intermittently
shifted-in in accordance with applied clocks, and the contents of the
shift registers (31-i) are sequentially read out.



Inventor: Fukushima, Tadashi
23-5, Hanayamacho-1-chome
Hitachi-shi(JP)

Inventor: Okuyama, Yoshiyuki

Inventor: Katoh, Takeshi
5-19, Higashionumacho-[illegible]-chome
Hitachi-shi(JP)

Inventor: Hirasawa, Kotaro
10-7, Kanesawacho-7-chome
Hitachi-shi(JP)

Inventor: Aseda, Kazuyoshi
15-8-1, Suwacho-[illegible]-chome
Hitachi-shi(JP)

Representative: Strehl, Schübel-Hopf, Groening, Schulz
Widenmayerstrasse 17 Postfach 22 03 45
D-8000 München 22(DE)






Croydon Printing Company Ltd.






PARALLEL IMAGE PROCESSOR 






BACKGROUND OF THE INVENTION

This invention relates to a processor for parallel image processing
which performs local neighboring (kernel) image processing such as
spatial convolution.

Image processing is classified into preprocessing, feature extraction
processing, judgement processing, etc., and the parallel image
processor according to this invention is mainly suited to
preprocessing.

This preprocessing is desirably performed by an image processor which
is versatile and capable of high-speed processing. However, since the
image data to be processed extend two-dimensionally, it is difficult
to process all the image data in parallel. Therefore, parallel
processing is often applied to operations among local neighboring
image data, such as spatial convolution intended for noise reduction
and edge enhancement. In order to process such local neighboring image
data, an LSI circuit for a local parallel type image processor has
been proposed, as disclosed in Japanese Patent Unexamined Publication
No. 59-146,366 (corresponding to U.S. Application Serial No. 578,508)
and U.S. Patent No. 4,550,437. This circuit was large-scale integrated
using, as a main module, a parallel operation circuit which operates
on parts of the local neighboring data in parallel; plural main
modules are arranged, or one main module is subjected to time division
processing, to extend the size of the local image region, thereby
performing the parallel processing of local neighboring operations at
high speed and with versatility.

Namely, this processor performs an m x n (m, n: integers) local
parallel image processing in such a way that (1) m main modules each
having n arithmetic units (processor elements, PE's) are arranged and
perform the processing in one machine cycle, or (2) a single main
module having n PE's is used in a time division manner and performs
the processing in m machine cycles.

Where, in the above prior art, plural main modules are used to perform
image processing, line buffer circuits are employed, as externally
equipped circuits, for supplying the image data in parallel to the
respective main modules. Therefore, once the wiring is made, the local
image region which permits parallel processing is disadvantageously
fixed. Moreover, additional line buffer circuits must be employed for
expanding the local neighboring region. For example, where a 3 x 3
local parallel operation is performed with an operating frequency of
6 MHz for an image of 256 x 256 pixels with each pixel represented by
8 bits, a 4 Kbit high-speed memory or shift register operating at a
frequency of 6 MHz is required, so that the required amount of
hardware becomes large.

On the other hand, where time division processing is carried out for
the image processing, the above line buffer circuit is not required.
However, the image data must be supplied to the main module by means
of a particular scanning method, namely stick scanning. In order to
convert the ordinary raster-scanned image data into stick-scanned
data, a larger amount of hardware is required than for the above line
buffer circuit.
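
The 4 Kbit figure quoted for the 3 x 3 example can be checked with a short calculation. The following Python sketch assumes that a k-row local region needs (k - 1) full image lines of delay storage; the function name `line_buffer_bits` is hypothetical and chosen only for this illustration.

```python
def line_buffer_bits(kernel_rows: int, line_width: int, bits_per_pixel: int) -> int:
    """Bits of line-delay storage needed to hold (kernel_rows - 1) full image lines."""
    return (kernel_rows - 1) * line_width * bits_per_pixel


# 3 x 3 region, 256 pixels per line, 8 bits per pixel (values taken from the text)
bits = line_buffer_bits(kernel_rows=3, line_width=256, bits_per_pixel=8)
print(bits, "bits =", bits // 1024, "Kbit")
```

Under this assumption, each additional row of the local region costs one further line of storage, which is why enlarging the region in the prior art requires additional external line buffer circuits.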

SUMMARY OF THE INVENTION

An object of this invention is to provide a parallel image processor
which is capable of obviating the above disadvantages of the prior art
and of easily expanding the local image region to be subjected to a
local neighboring operation with a smaller amount of hardware.

Another object of this invention is to provide a parallel image
processor which can be flexibly applied to several local image regions
by means of the same hardware construction.

These objects can be attained by an LSI parallel image processor in
which line buffers and data-flow switching circuits, which require a
large amount of hardware in the prior art, are incorporated into an
LSI circuit; the image data delayed by the line buffers are output
from an image data output port; shift registers each having a variable
number of steps for preserving local image regions are intermittently
shifted-in in accordance with applied clocks; and the contents of the
shift registers are sequentially read out.

Namely, in accordance with this invention, the amount of hardware can
be reduced since the line buffer circuits are incorporated in the LSI
and the delayed image data are output, and the size of the local image
region can be easily expanded simply by connecting LSI's. Further, the
data-flow switching circuits are also incorporated as peripheral
circuits to operate the step-number-variable shift registers in a time
division manner, so that the parallel image processor according to
this invention can be freely adapted to various local image regions
without altering the external wiring.

The above and other objects and features of this invention will be
apparent from the following description taken in conjunction with the
accompanying drawings, in which like reference characters refer to
like elements in the several views.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram showing one arrangement of the main module
used in the parallel image processor according to one embodiment of
this invention;

Fig. 2 is a view for explaining a local parallel operation system;

Fig. 3 is a block diagram of a parallel operation section inside the
main module;

Fig. 4 is a block diagram showing a unifying circuit inside the main
module;

Fig. 5 is a block diagram for explaining examples of the operation of
the unifying circuit;

Fig. 6 is a block diagram showing the arrangement of a line buffer
inside the main module;

Fig. 7 is a block diagram showing one arrangement of a
step-number-variable shift register inside the main module;

Fig. 8 is a circuit diagram of each of the cells in the
step-number-variable shift register;

Fig. 9 is a view for explaining the operation of the
step-number-variable shift register of Fig. 7;

Fig. 10 is a timing chart of the step-number-variable shift register
of Fig. 7;

Fig. 11 is a block diagram showing another arrangement of a
step-number-variable shift register inside the main module;

Fig. 12 is a timing chart of the step-number-variable shift register
of Fig. 11;

Fig. 13 is a block diagram showing still another arrangement of a
step-number-variable shift register inside the main module;

Fig. 14 is a view for explaining the operation of the
step-number-variable shift register of Fig. 13;

Fig. 15 is a timing chart of the step-number-variable shift register
of Fig. 13;

Figs. 16 to 18 are block diagrams showing examples of the application
of the main module, respectively;

Fig. 19 is a block diagram showing an arrangement of the main module
used in the parallel image processor according to another embodiment
of this invention;

Figs. 20 to 22 are block diagrams showing examples of the application
of the main module, respectively;

Fig. 23 is a block diagram showing an arrangement of the main module
used in the parallel image processor according to still another
embodiment of this invention; and

Figs. 24 to 27 are block diagrams showing examples of the application
of the main module, respectively.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Several embodiments of this invention will be explained hereinafter
with reference to the drawings.

Fig. 2 shows a local parallel operation system for performing, at
high speed, a 3 x 3 (m x n; m, n: integers) local neighboring image
processing, which is a main operation of image preprocessing. It is
assumed that an input image 1 to be processed is a gray-scale image
consisting of 10 x 10 image data, and that the image is raster-scanned
in the order of the circled numbers shown in Fig. 2. Fig. 2 shows the
state when the raster scanning has proceeded as far as the image
data (33).

The image data raster-scanned from input image 1 are fed to a register
31-00 and a line buffer 20-0. The image data fed to register 31-00 are
shifted to registers 31-01 and 31-02 in order. The image data fed to
line buffer 20-0 are delayed by the time required to scan one line of
the image data and are then fetched therefrom.

The image data fetched from line buffer 20-0 are fed to a register
31-10 and a line buffer 20-1. The image data fed to register 31-10 are
shifted to registers 31-11 and 31-12 in order. The image data fed to
line buffer 20-1 are delayed by the time required to scan one line of
the image data and are then fetched therefrom.

The image data fetched from line buffer 20-1 are fed to a register
31-20. The image data fed to register 31-20 are shifted to registers
31-21 and 31-22 in order.

Thus, when the image data (33) has been fed to register 31-00 and line
buffer 20-0, the 3 x 3 local neighboring image data (11), (12), (13),
(21), (22), (23), (31), (32) and (33), with the image data (22)
centered, are simultaneously stored in the nine registers 31,
respectively. Therefore, by employing the same number of arithmetic
units as registers 31, the image data in the respective registers 31
can be operated on in parallel, so that high-speed processing can be
realized.
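
By way of illustration, the data flow of Fig. 2 may be modelled in software as in the following minimal Python sketch. The function name `windows_3x3`, the use of `deque` objects to stand in for line buffers 20-0 and 20-1, and the list `regs` standing in for the nine registers 31 are assumptions made for this sketch; it is a behavioural model only and does not describe the hardware itself.

```python
from collections import deque


def windows_3x3(image):
    """Yield (row, col, window) for each fully valid 3x3 neighbourhood.

    Pixels are consumed in raster order; two one-line delays feed a
    3 x 3 array of registers, as in the arrangement described for Fig. 2.
    """
    height, width = len(image), len(image[0])
    lb0 = deque([None] * width)  # stands in for line buffer 20-0 (one-line delay)
    lb1 = deque([None] * width)  # stands in for line buffer 20-1 (a further line)
    regs = [[None] * 3 for _ in range(3)]  # regs[r][c] ~ register 31-rc
    for y in range(height):
        for x in range(width):
            pixel = image[y][x]
            d1 = lb0.popleft()   # pixel from one line earlier
            lb0.append(pixel)
            d2 = lb1.popleft()   # pixel from two lines earlier
            lb1.append(d1)
            for row, new in zip(regs, (pixel, d1, d2)):
                # shift 31-r0 -> 31-r1 -> 31-r2 and load the new value into 31-r0
                row[2], row[1], row[0] = row[1], row[0], new
            if y >= 2 and x >= 2:
                # regs[0] holds the newest line; reorder into image orientation
                window = [list(reversed(regs[2 - r])) for r in range(3)]
                yield y - 1, x - 1, window
```

For a 10 x 10 image this generator produces 64 windows, each available in the same step in which its last pixel arrives, which is the property that lets nine arithmetic units work on one neighbourhood at once.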

Fig. 1 shows an arrangement of the main module 10 of the parallel
image processor according to one embodiment of this invention, which
is capable of implementing the above local parallel operation system.
Main module 10 comprises an image data input port 54 from which the
image data are input, an image data output port 55 from which the
image data delayed inside main module 10 are output, an operation data
input port 64 from which the operation result from another main module
10 is input, and an operation result output port 65 from which the
internal processing result is output.

The image data raster-scanned from input image 1 are fed to a
step-number-variable shift register (VSR) 31-0, line buffer 20-0, and
a selector 70 through image data input port 54. Line buffer 20-0
delays the input image data by the time required to scan one line of
the image data, and delivers the delayed image data to a selector
33-0, line buffer 20-1, and selector 70. Line buffer 20-1 delays the
image data fed from line buffer 20-0 by the time required to scan one
further line of the image data and delivers the delayed image data to
selectors 33-1 and 70.
Selector 70 selects one of the image data from image data input port
54, the output from line buffer 20-0 and the output from line buffer
20-1 in accordance with a control signal from a control circuit 21,
and outputs it from image data output port 55. Namely, one of the
image data delayed from the input image data by 0, 1 and 2 lines is
selected by selector 70 and output from image data output port 55.
(Incidentally, the output from image data output port 55 is the input
image data of the next main module 10 when plural main modules are
employed.)

VSR 31-0 carries out a shifting operation in accordance with a control
signal from control circuit 21 and delivers the image data to a
parallel operating section 30 and selector 33-0.

Selector 33-0 selects either the output from line buffer 20-0 or the
output from VSR 31-0 in accordance with a control signal from control
circuit 21 and delivers it to VSR 31-1. VSR 31-1 carries out a
shifting operation in the same manner as VSR 31-0 and supplies the
image data to parallel operating section 30 and selector 33-1.

Selector 33-1 selects either the output from line buffer 20-1 or the
output from VSR 31-1 under the same control as selector 33-0 and
supplies it to VSR 31-2. VSR 31-2 carries out a shifting operation in
the same manner as VSR 31-0 and supplies the image data to parallel
operating section 30. Thus, VSR's 31 can be arranged in either of two
manners, 1 x 3 and 3 x 1, by the switching operation of selectors 33.
The arrangement of VSR's 31 corresponds to that of the local image
data which can be operated on simultaneously during one machine cycle.
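
The effect of selectors 33 on the arrangement of the VSR's can be sketched as follows. This Python model makes several simplifying assumptions that are not stated in the text: each VSR is reduced to a single one-step register, all registers are clocked together, and "1 x 3" is read as three pixels along one line while "3 x 1" is read as three pixels in one column. The function name `vsr_outputs` and its parameters are hypothetical.

```python
from collections import deque


def vsr_outputs(stream, width, use_line_buffers):
    """Return, per cycle, the contents of three registers standing in for VSR 31-0..31-2.

    use_line_buffers=False: each selector passes the preceding VSR (same line).
    use_line_buffers=True : each selector passes a line buffer output (same column).
    """
    lb0 = deque([None] * width)  # stands in for line buffer 20-0
    lb1 = deque([None] * width)  # stands in for line buffer 20-1
    v0 = v1 = v2 = None
    out = []
    for pixel in stream:
        d1 = lb0.popleft()
        lb0.append(pixel)
        d2 = lb1.popleft()
        lb1.append(d1)
        in1 = d1 if use_line_buffers else v0  # choice made by selector 33-0
        in2 = d2 if use_line_buffers else v1  # choice made by selector 33-1
        v0, v1, v2 = pixel, in1, in2          # all registers update together
        out.append((v0, v1, v2))
    return out
```

With a line width of 4 and the stream 0, 1, 2, ..., the chained setting yields three consecutive pixels such as (5, 4, 3), while the line-buffer setting yields pixels one line apart such as (9, 5, 1), without any change to the wiring between the blocks.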

Parallel operating section 30 operates in parallel on the image data
from VSR's 31-0, 31-1 and 31-2 and delivers the result of the
operation to a unifying circuit 40. Unifying circuit 40 unifies the
operation data supplied from operation data input port 64 and the
output from parallel operating section 30. The unified data is fetched
from operation data output port 65 and stored in an output image 2.

The main module 10 in accordance with this embodiment permits three
image data simultaneously supplied from three VSR's 31 to be processed
in parallel in parallel operating section 30.

On the other hand, the most general local neighboring image operation
is an operation of processing 3 x 3 local neighboring image data as
shown in Fig. 2, in which nine image data are required to calculate
one output image datum. Such a 3 x 3 local neighboring image operation
using the main module 10 can be realized by either of the following
two systems:

(1) time division processing

(2) provision of more main modules

The system of (1) operates on nine local neighboring image data in
such a way that three image data are assigned to each of three machine
cycles, and unifies the operation results in unifying circuit 40 over
the three machine cycles. In this system, the input of the image data
and the output of the operation results are performed once every three
machine cycles. The main module 10 according to this embodiment
permits time division processing of up to eight machine cycles, so
that up to 24 image data can be processed in a time division manner
using one main module 10.

In the case of n-times time division processing, line buffer 20 is
operated once during n machine cycles, and VSR 31 performs a shift
operation once during n machine cycles and preserves 1 x n local
neighboring image data during the n machine cycles. VSR 31 further
sends the n image data to parallel operating section 30 one by one
during the n machine cycles. Parallel operating section 30 performs
the arithmetic between the image data supplied over the n cycles and
the n coefficient data produced in correspondence with the image data,
one per machine cycle, and supplies the operation results (data) to
unifying circuit 40 every machine cycle. Unifying circuit 40 unifies
the operation data supplied n times from parallel operating section 30
over the n machine cycles and outputs the unified data from operation
data output port 65. Thus, this system is slow in its processing speed
but requires only one main module and a smaller amount of hardware.
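
The time division scheme described above can be illustrated for a single output pixel by the following Python sketch. It assumes that the operation is a sum of products (as in spatial convolution), that each of the three rows of the window is held by one VSR, and that the per-cycle coefficient is looked up by the cycle index; the function name `convolve_pixel_time_division` is hypothetical.

```python
def convolve_pixel_time_division(window, kernel):
    """Sum of products over a local region, computed as n cycles of parallel multiplies.

    window[row][k] is the k-th pixel held by the VSR for that row;
    kernel[row][k] is the coefficient read out for that row in cycle k.
    """
    n_cycles = len(window[0])  # time division number n
    acc = 0                    # running value kept by the unifying stage
    for cycle in range(n_cycles):
        # One multiply per processor element, all in the same machine cycle.
        products = [window[row][cycle] * kernel[row][cycle]
                    for row in range(len(window))]
        partial = sum(products)  # combined across the processor elements
        acc += partial           # accumulated across machine cycles
    return acc


window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
result = convolve_pixel_time_division(window, laplacian)
```

With three processor elements, n cycles cover 3 x n pixels, which is consistent with the stated limit of 24 image data for eight machine cycles.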

The system of (2) simultaneously operates on the 3 x 3 local
neighboring image data during one machine cycle using three main
modules 10. In this system, three image data are operated on in each
main module and the operation data are unified across these three main
modules. This system requires a larger amount of hardware than the
system of (1) but can perform the operations at high speed.

The main module 10 according to this embodiment is also adapted to
multi-mask processing. Multi-mask processing with the number of masks
set at m is a processing of performing m local neighboring image
operations on one input image 1 and unifying the m output images 2
thus obtained to provide a final result. This multi-mask processing is
used for edge enhancement processing, etc. The main module 10 in
accordance with this invention permits the processing prior to the
unification in the multi-mask processing to be performed in one image
scan. In the case of multi-mask processing with the number of masks
set at m, the image data is taken in once in m machine cycles, and
line buffer 20 and VSR 31 also operate once in m machine cycles. VSR
31 continues to supply the same image data to parallel operating
section 30 for m machine cycles. Parallel operating section 30
produces m coefficient patterns for one image datum during the m
machine cycles and performs the arithmetic thereof with the image data
every machine cycle. The m operation results are sequentially output
from operation data output port 65 during the m machine cycles.
Further, this multi-mask processing can be combined with the time
division processing mentioned above. In the case of time-division
multi-mask processing with the numbers of time divisions and masks set
at t and m, respectively, the image data is taken in once in t x m
machine cycles and the m operation results are sequentially output,
one every t machine cycles.

The above time-division multi-mask processing can be realized by
externally operating control circuit 21 to set a control signal MSKTMS
from control circuit 21 giving (mask number x time division number - 1)
and another control signal TMS from control circuit 21 giving
(time division number - 1).
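
The relation between the two control values and the resulting cycle schedule can be written out as in the following Python sketch. The formulas for MSKTMS and TMS are taken from the text; the ordering of cycles within one pixel period (time slots as the inner loop, masks as the outer loop) is an assumption inferred from the statement that one result is output every t machine cycles. The function names are hypothetical.

```python
def control_values(time_divisions: int, masks: int):
    """Return (MSKTMS, TMS): mask number x time division number - 1, and time division number - 1."""
    if time_divisions < 1 or masks < 1:
        raise ValueError("both counts must be at least 1")
    return masks * time_divisions - 1, time_divisions - 1


def schedule(time_divisions: int, masks: int):
    """List (cycle, mask index, time slot, result_ready) for one pixel period."""
    rows = []
    for cycle in range(time_divisions * masks):
        mask, slot = divmod(cycle, time_divisions)
        rows.append((cycle, mask, slot, slot == time_divisions - 1))
    return rows


msktms, tms = control_values(time_divisions=3, masks=2)
for row in schedule(3, 2):
    print(row)
```

For t = 3 and m = 2 this gives MSKTMS = 5 and TMS = 2, with one pixel taken in every six machine cycles and a result completed in the third and sixth cycles of each period.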

Fig. 3 illustrates a detailed arrangement of parallel operating
section 30. In this figure, output signal lines 300, 301 and 302 from
VSR's 31-0, 31-1 and 31-2 are connected to one input of each of three
processor elements (PE's) 37-0, 37-1 and 37-2, respectively. The other
inputs thereof are connected with three coefficient memories 36-0,
36-1 and 36-2, which supply the previously stored coefficient data to
the corresponding processor elements 37 in accordance with address
outputs from a counter 35. The outputs from these operation circuits
are unified by an arithmetic element 38 and the unified data are fed
to unifying circuit 40 through a signal line 400.

In the case of MSKTMS 1014 ≠ 0, the time division processing or
multi-mask processing is realized, and coefficient memories 36 read
out the coefficient data at addresses which are supplied from counter
35 and changed every machine cycle, and supply them to processor
elements 37.

Fig. 4 illustrates a detailed arrangement of unifying circuit 40. The
output from parallel operating section 30 is fed to a register 41 and
a selector 42 through signal line 400. The output from register 41 is
fed to a selector 43. Selector 42 selects either the operation data
supplied from operation data input port 64 through a signal line 640
or the output from parallel operating section 30 and supplies it to an
arithmetic unit 44. Selector 43 selects either an output line 410 from
register 41 or an output line 650 from unifying circuit 40 and
supplies it to arithmetic unit 44. The output from arithmetic unit 44
is fetched to the outside from operation data output port 65 through
signal line 650.

Selectors 42 and 43 are controlled by control signals 420 and 430 from
a counter 46, respectively. Counter 46 is controlled by a reset signal
450 and a control signal TMS 1013 providing (time division number - 1),
which are supplied from control circuit 21, in such a manner that it
is reset when the reset signal is "High" and repeats the count-up from
0 to TMS. With TMS = 0, selectors 42 and 43 always select signal lines
640 and 410, respectively. With TMS ≠ 0, selector 42 selects signal
line 640 only when the value of counter 46 becomes equal to TMS, and
selector 43 selects signal line 410 only when the value of counter 46
becomes zero.

Fig. 5 shows the operation of unifying circuit 40 when TMS = 2.
Unifying circuit 40 unifies, during (TMS + 1) machine cycles, the
(TMS + 1) operation data supplied during those cycles and one
operation datum supplied from data line 640.

In the case shown in Fig. 5, operation data a, b and c on data line
400 and operation data T on data line 640 are unified by addition.
During a first machine cycle, the operation data a and b are added.
During a second machine cycle, a + b and c are added to provide
a + b + c. During a third machine cycle, a + b + c and T are added.
During the subsequent machine cycle, the unified result a + b + c + T
is fetched from register 45.
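
The selector rules stated for counter 46 can be exercised with a small cycle-by-cycle model. The Python sketch below is an interpretation under several assumptions that the text does not spell out: register 41 is treated as a one-cycle delay of line 400, register 45 is treated as an output register that latches the result of arithmetic unit 44 and drives line 650, the unifying operation is addition, and the first datum of each group arrives while the counter equals TMS. The function name `unify` is hypothetical.

```python
def unify(stream_400, stream_640, tms):
    """Model of the unifying stage; returns the value latched at the end of each cycle.

    stream_400[k]: value on line 400 in cycle k
    stream_640[k]: value on line 640 in cycle k
    """
    reg41 = 0      # assumed one-cycle delay of line 400
    reg45 = 0      # assumed output register feeding line 650
    counter = tms  # assumed phase: first datum arrives in the last slot of a group
    latched = []
    for x400, x640 in zip(stream_400, stream_640):
        in_a = x640 if counter == tms else x400  # selector 42
        in_b = reg41 if counter == 0 else reg45  # selector 43
        result = in_a + in_b                     # arithmetic unit 44 (addition assumed)
        reg41, reg45 = x400, result              # registers update at the end of the cycle
        latched.append(reg45)
        counter = 0 if counter == tms else counter + 1  # counts 0..TMS repeatedly
    return latched


# a, b, c = 1, 2, 3 with T = 10, followed by a second group 4, 5, 6 with T = 20
out = unify([1, 2, 3, 4, 5, 6, 0], [0, 0, 0, 10, 0, 0, 20], tms=2)
```

Under these assumptions the model reproduces the sequence a + b, a + b + c, a + b + c + T described for Fig. 5, and with TMS = 0 it reduces to adding the value on line 640 to the delayed value from register 41 in every cycle.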
Fig. 6 illustrates a detailed arrangement of the two line buffers 20-0
and 20-1 of Fig. 1, which are constructed from RAM's.

The arrangement shown in Fig. 6 is adapted to permit the number of
delaying steps to be altered, i.e. to form either two line buffers
which can each delay 8-bit data by up to 1024 steps or one line buffer
which can delay 8-bit data by up to 2048 steps.

In Fig. 6, RAM's 241 and 242 each have a storage capacity of 8 x 1024
bits. When a clock signal 2102 is at its high level (hereinafter
simply referred to as "High"), the 8-bit data of RAM's 241 and 242 at
the address given by the output of a row address control circuit 245,
i.e. a 10-bit row address signal 2103, are read out onto signal lines
252 and 253, respectively. When clock signal 2102 is at its low level
(hereinafter simply referred to as "Low") and an output data 2104 from
an input/output information control circuit 246 is "Low", the 8-bit
data on input signal line 540 is stored at the address of RAM 241
corresponding to row address signal 2103. On the other hand, when
clock signal 2102 is "Low" and output data 2104 from input/output
information control circuit 246 is "High", the 8-bit data on input
signal line 540 is stored at the address of RAM 242 corresponding to
row address signal 2103. The respective 8-bit data on signal lines 252
and 253, read out from RAM's 241 and 242, are fed to selectors 243 and
244.

Selector 243 selects the data on signal line 252 when signal line 2104
is "Low" and the data on signal line 253 when signal line 2104 is
"High", and delivers the selected data to an output signal line 200.
On the other hand, selector 244 selects the data on signal line 253
when signal line 2104 is "Low" and the data on signal line 252 when
signal line 2104 is "High", and delivers the selected data to an
output signal line 201.

Row address control circuit 245 is a 10-bit binary counter which is
counted up each time clock signal 2102 becomes "High" with control
signal 2101 "Low", and is initialized to zero when control signal 2101
becomes "High". Row address control circuit 245 delivers the counted
data to a logic circuit 247 as well as to RAM's 241 and 242 as the
10-bit row address signal 2103. Logic circuit 247 delivers a "High"
level output to a signal line 2106 when all 10 bits of the row address
signal are "High" or when signal line 2101 is "High". In all other
cases, logic circuit 247 delivers a "Low" level output.

Input/output information control circuit 246 is a one-bit counter
(i.e. a T flip-flop) which changes the status of signal line 2104 from
"High" to "Low" or from "Low" to "High" each time signal line 2106
becomes "High" with initialization signal 2105 "Low". When
initialization signal 2105 is "High", signal line 2104 is initialized
to "Low".

The circuit of Fig. 6 operates as follows.

It is assumed that, as an initial state, control signal 2101, clock
signal 2102 and initialization signal 2105 are all "Low". Now, after
initialization signal 2105 is changed to "High" and then to "Low",
control signal 2101 is made "High". Then, the output signal 2103 from
row address control circuit 245 is zero and the output signal 2104
from input/output information control circuit 246 is "Low".
Thereafter, control signal 2101 is changed to "Low", and clock signal
2102 is changed from "Low" to "High" and further to "Low". At this
time, while clock signal 2102 is "High", the 8-bit content at the 0-th
address of RAM 241 is read out onto output signal line 200 through
signal line 252 and selector 243, and the 8-bit content at the 0-th
address of RAM 242 is fed onto output signal line 201 through signal
line 253 and selector 244. When clock signal 2102 becomes "Low", the
8-bit data on input signal line 540 is stored (written) at the 0-th
address of RAM 241. At this time, the contents of RAM 242 do not vary
at any row address.

Thereafter, each time clock signal 2102 is changed from "Low" to
"High" and further to "Low", the row address for read-out and write-in
is increased by one; in the same manner as mentioned above, the data
read out from RAM 241 is fed to output signal line 200, the data read
out from RAM 242 is fed to output signal line 201, and the 8-bit data
on input signal line 540 is stored at the address of RAM 241
corresponding to the present row address signal.

It is now assumed that control signal 2101 has become "High" before
row address signal 2103 reaches 1023. Then, signal line 2106 is
changed from "Low" to "High". The level change of signal line 2106
changes the state of input/output information control circuit 246,
making signal line 2104 "High". Thus, the selection states of
selectors 243 and 244 are switched so that signal line 252 is
connected with output signal line 201 and signal line 253 is connected
with output signal line 200. The writable RAM is changed from RAM 241
to RAM 242, so that RAM 241 is no longer writable. The output signal
(row address signal) 2103 from row address control circuit 245 is
initialized to zero.

Thereafter, if clock signal 2102 is pulsed after control signal 2101
is returned to "Low", the row address signal 2103 is increased from
zero one by one. When clock signal 2102 is "High", in accordance with
the present row address signal, the data read out from RAM 241 is fed
to output signal line 201 through signal line 252 and selector 244
while the data read out from RAM 242 is fed to output signal line 200
through signal line 253 and selector 243. When clock signal 2102 is
"Low", the data on input signal line 540 is stored at the address of
RAM 242 corresponding to the present row address signal 2103.

The relation between the arrangement of Fig. 6 and the main module of
Fig. 1 will be explained below.

It is now assumed that the contents of RAM's 241 and 242 in Fig. 6 are
undefined in their initial state, and that input image 1 has 100
pixels in its horizontal direction.

In Fig. 6, the image data of input image 1 are input from input signal
line 540 and first written into RAM 241. Namely, 100 image data
belonging to the first raster are sequentially written at row
addresses 0 to 99 of RAM 241. At this time, undefined data are read
out from RAM's 241 and 242. Next, 100 image data (pixel data)
belonging to the second raster are written at row addresses 0 to 99 of
RAM 242. At this time, the first raster image data are read out from
RAM 241 while undefined data are read out from RAM 242.

100 image data belonging to the third raster are written at row
addresses 0 to 99 of RAM 241. At this time, the first raster image
data are read out from RAM 241 to output signal line 200 through
signal line 252 and selector 243 while the second raster image data
are read out from RAM 242 to output signal line 201 through signal
line 253 and selector 244. Moreover, 100 image data belonging to the
fourth raster are written at row addresses 0 to 99 of RAM 242. At this
time, the second raster image data are read out from RAM 242 to output
signal line 200 through signal line 253 and selector 243 while the
third raster image data are read out from RAM 241 to output signal
line 201 through signal line 252 and selector 244.

Namely, when the third raster image data are input, RAM's 241 and 242
output the data as line buffers 20-1 and 20-0, respectively. On the
other hand, when the fourth raster image data are input, RAM's 241 and
242 output the data as line buffers 20-0 and 20-1, respectively.

Generally, the odd-numbered raster image data are written in RAM 241
whereas the even-numbered raster image data are written in RAM 242.
The raster image data read out from RAM's 241 and 242 are fed to
output signal lines 200 and 201 in such a way that the lower-numbered
raster image data are fed to output signal line 200 whereas the
higher-numbered raster image data are fed to output signal line 201.
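
The alternating use of the two RAM's can be followed with a behavioural model such as the Python sketch below. It is a simplification under stated assumptions: the class name `DualRamLineBuffer` and its methods are hypothetical, the pulse of control signal 2101 at the start of each raster is represented by a call to `start_line`, the first raster is forced into RAM 241 by a flag rather than by modelling signals 2105 and 2106, and undefined RAM contents are represented by `None`.

```python
class DualRamLineBuffer:
    """Behavioural model of the two-RAM arrangement used as two line buffers."""

    def __init__(self, depth=1024):
        self.ram = [[None] * depth, [None] * depth]  # index 0 ~ RAM 241, 1 ~ RAM 242
        self.write_sel = 0    # ~ signal 2104: 0 (Low) writes RAM 241, 1 (High) writes RAM 242
        self.addr = 0         # ~ row address signal 2103
        self.first_line = True

    def start_line(self):
        """Start of a raster: reset the address and swap the roles of the RAM's."""
        if not self.first_line:
            self.write_sel ^= 1
        self.first_line = False
        self.addr = 0

    def clock(self, pixel):
        """One clock: read both RAM's, then write the input into the selected RAM."""
        out_200 = self.ram[self.write_sel][self.addr]      # older raster (two lines back)
        out_201 = self.ram[self.write_sel ^ 1][self.addr]  # newer raster (one line back)
        self.ram[self.write_sel][self.addr] = pixel
        self.addr += 1
        return out_200, out_201
```

Feeding four rasters through this model gives, during the third raster, the first raster on output 200 and the second on output 201, and during the fourth raster, the second on output 200 and the third on output 201, in agreement with the description above.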

When the number of delay steps exceeds 1024, i.e., when the row
address reaches 1023, signal line 2106 becomes "High" so that the
output signal 2104 from input/output information control circuit 246
changes state. Thus, writing into the RAM written so far ceases and
writing into the other RAM is instructed (this writing starts from the
0-th address thereof). Also, when signal 2104 changes state, the
connections between RAM's 241 and 242 and output signal lines 200 and
201 are switched. Accordingly, the arrangement shown in Fig. 6 can be
used as an 8-bit, 2048-step line buffer having input signal 540 and
output signal 200.

The line buffers described above were constructed from RAM's, which
are suitable for LSI, but it is needless to say that they can also be
constructed from shift registers.

Fig. 7 illustrates one detailed arrangement of
VSR 31-0.

VSR 31-0 consists of a read-out signal control
section 18 for performing a shifting operation, an
output selection control section 19 and
variable-step-number shift register cells (vsr) 100. The
image data raster-scanned from input image 1 are input to
vsr's 100 from input data line 540 as 8-bit data. The
output from vsr's 100 is fed to parallel operating section
30 and selector 33-0. Each vsr 100 performs the input and
shift of the data by the read-out and write-in of the
data during one machine cycle. In VSR 31-0 shown in Fig. 7,
each vsr 100 performs the write-in and read-out of the data
in accordance with a write enable signal φ1 1001 in
synchronism with a clock and a read enable signal φ2' 1006
supplied from read-out signal control section 18. The
output selection signal 1015 supplied from output selection
control section 19 is fed to a clock gate 1500 (Fig. 8)
constituting a selector, which is embedded in vsr 100.
The data in the vsr 100 for which output selection
signal 1015 becomes "High" is fed to output data line 300
as an output from the selector.

Read-out signal control section 18 in Fig. 7
takes in (inputs) a read enable signal 1002 in synchronism
with a clock and outputs a read enable signal 1006 which
intermittently becomes "High".

The read-out signal control section for performing
a shifting operation consists of a 4-bit down counter 104,
a half register (HR) 102 and a delay circuit 101. The 4-bit
down counter 104 counts down on every clock. When a reset
signal 1000 becomes "High" or a counter output 1004 becomes
zero, a load signal 1024 becomes "High", and during the
subsequent machine cycle, 4-bit data MSKTMS 1014 is loaded
into the 4-bit down counter 104 from control circuit 21.
HR 102 and delay circuit 101 generate a read control signal
1005 by delaying load signal 1024 by a half machine cycle,
so that read enable signal 1006 can be "High" during the
machine cycle subsequent to the machine cycle during which
load signal 1024 has become "High".

Output selection control section 19 consists of
a 3-bit up counter 103 and a decoder 105 and switches
output selection signal 1015 every machine cycle. The
3-bit up counter 103 counts up on every clock. When reset
signal 1000 becomes "High", or the counter output coincides
with the 3-bit data TMS 1013 supplied from control circuit
21, a reset signal 1023 becomes "High", and during the
subsequent machine cycle, the 3-bit up counter 103 is reset.
The output 1003 from 3-bit up counter 103 is decoded by
decoder 105 and becomes output selection signal 1015.

It should be noted that the step number of
shift register 31-0 can be altered by the TMS signal and
is (TMS + 1) when a predetermined value is set at TMS.

Fig. 8 shows the detail of vsr 100, which is a
1-bit, one-step shift register. The vsr 100 performs the
data shifting by reading out the data in vsr 100 to an
output line 1011 during the former half of one machine
cycle and by writing the data from an input line 1010 into
vsr 100. Input line 1010 is connected with input data
line 540 at the first-step vsr 100, and is connected with
the output line 1011 of the previous-step vsr 100 at the
vsr's other than the first-step vsr. The data in vsr 100
is fed to output data line 300 when output selection
signal 1015 is "High".

Fig. 9 shows the operation of VSR 31-0 when
MSKTMS = 5 and TMS = 2, and Fig. 10 is a timing chart
thereof. VSR 31-0 inputs and shifts the data once in
(MSKTMS + 1) machine cycles and sequentially outputs the
data in VSR 31-0 during (TMS + 1) machine cycles. In the
case of Fig. 9, the data is input and shifted once in 6
machine cycles in VSR 31-0 and the data stored in
VSR 31-0 are sequentially output during 3 machine cycles.

Symbols ① to ⑨ shown in Figs. 9 and 10
designate the first to the ninth machine cycles,
respectively. The first machine cycle corresponds to the
state where data A and B are stored in VSR 31-0 and data C
has reached input data line 540. Then, when reset signal
1000 is made "High", the 4-bit down counter and 3-bit up
counter are initialized, respectively. Also, since the
read-out control signal (RDEN) 1005 is "High" from the
first machine cycle to the second machine cycle, read
enable signal φ2' 1006 is "High" in the second machine
cycle. Thus, from the first machine cycle to the second
machine cycle, the data C is input in VSR 31-0 and the
data A and B are shifted rightwards by one step.

During the second machine cycle to the seventh
machine cycle, 3-bit up counter 103 continues to count
like 0, 1, 2, 0, 1, 2, so that the data A, B and C stored
in VSR 31-0 are output in the order of C, B, A, C, B, A.

At the seventh machine cycle, the subsequent
data D reaches input data line 540. Then, 4-bit down
counter 104 outputs zero and the read-out control signal
(RDEN) 1005 is "High" from the seventh machine cycle to
the eighth machine cycle so that, as in the first to
second machine cycles, from the seventh machine cycle to
the eighth machine cycle, the data D is input in VSR 31-0
and further, the data B and C are shifted rightwards by one
step, and the data A is abandoned. Thereafter, during six
machine cycles from the eighth machine cycle, the data B,
C and D are preserved, and sequentially read out from
VSR 31-0 in the order of D, C, B, D, C, B.
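
The sequence of Figs. 9 and 10 can be reproduced with a small behavioural sketch in Python. It is a simplified model, not the circuit of Fig. 7: the function name `vsr_readout` is hypothetical, the half-cycle timing of HR 102 and delay circuit 101 is ignored, and the register is assumed to hold exactly (TMS + 1) steps with the most recently input datum at position 0.

```python
from collections import deque


def vsr_readout(initial, inputs, msktms, tms):
    """Behavioural model of the intermittent shift and cyclic read-out.

    initial: data already held, newest first (e.g. ["B", "A"]).
    inputs:  data arriving once every (MSKTMS + 1) machine cycles.
    """
    reg = deque(initial, maxlen=tms + 1)  # index 0 = newest datum
    counter = 0                           # stands in for up counter 103
    out = []
    for datum in inputs:
        reg.appendleft(datum)             # shift in; the oldest datum is dropped
        for _ in range(msktms + 1):       # one read-out per machine cycle
            out.append(reg[counter])
            # The counter is reset after it coincides with TMS.
            counter = 0 if counter == tms else counter + 1
    return out
```

Calling `vsr_readout(["B", "A"], ["C", "D"], 5, 2)` yields C, B, A, C, B, A followed by D, C, B, D, C, B, matching the order stated above.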

According to the arrangement of VSR 31-0 shown
in Fig. 7, a local neighboring (kernel) image can be cut
out from the intermittently supplied, raster-scanned input
image 1 and preserved in the step-number-variable shift
register (VSR). Also, the preserved local neighboring
image data can be supplied to an operation circuit in a
time division manner.

Fig. 11 shows another arrangement of VSR 31-0.
In this arrangement VSR 31-0 consists of a write signal
control section 28 for performing a shifting operation,
an output selection control section 19 and
step-number-variable shift register cells (vsr) 100. In this
arrangement, each vsr 100 performs the write-in and read-out
of the data in accordance with a write enable signal
φ1' 1106 output from write signal control section 28 and
the read enable signal φ2 1002 in synchronism with a
clock, respectively.

Write signal control section 28 corresponds to
read signal control section 18 of Fig. 7 which performs a
shifting operation, and takes in (inputs) write enable
signal 1001 in synchronism with a clock and outputs
write enable signal 1106 which intermittently becomes
"High". In the arrangement of Fig. 11, write signal
control section 28 consists of the 4-bit down counter 104
only, and the load signal 1024 from 4-bit down counter 104
is employed as a write control signal as it is.

Fig. 12 shows a timing chart of the operation of
VSR 31-0 when MSKTMS = 5 and TMS = 2 in this arrangement.
The operation in this arrangement is the same as that
shown in Fig. 9. In the timing chart of Fig. 12,
in the first and seventh machine cycles, the load signal
1024 from 4-bit down counter 104 becomes "High" and write
enable signal 1106 also becomes "High". Thus, the
data C is input in VSR 31-0 from the first machine cycle
to the second machine cycle and the data A and B
are shifted rightwards by one step, respectively; the data
D is input in VSR 31-0 from the seventh machine cycle to
the eighth machine cycle and the data B and C are
shifted rightwards by one step, respectively.

According to this arrangement of VSR 31-0, the
same effect as in the previous arrangement shown in Fig. 7
can be attained with less hardware than the latter.

Fig. 13 shows still another arrangement of VSR
31-0. In this arrangement, VSR 31-0 consists of a write
control section 28 for performing a shifting operation,
an output selection control section 29 and
step-number-variable shift register cells (vsr) 100.

The output selection control section 29 in this
arrangement consists of a 3-bit up counter, a RAM 203
and a decoder 105. The counter output line 1003 constitutes
an address line for RAM 203, and the contents at the address
specified by the counter output line are fetched from the RAM
output line 2003, fed to decoder 105, and converted into an
output selection signal 1015 which is supplied to vsr 100.

Fig. 14 shows the operation of VSR 31-0 when
0, 2 and 4 have previously been stored at the
addresses of RAM 203, and Fig. 15 shows its timing
chart. In Figs. 14 and 15, MSKTMS and TMS are set at 5
and 2, respectively.

The input and shifting of the data are performed
from the first machine cycle to the second machine cycle,
and thereafter during the second to seventh machine cycles,
the contents A, C and E of vsr 100, which are specified
every clock by the RAM output line, are read out in the
order of E, C, A, E, C, A. Further, the input and shifting
of the data are performed from the seventh machine cycle
to the eighth machine cycle, and thereafter the contents
B, D and F of vsr 100 are read out in the order of F, D,
B, F, D, B in accordance with the RAM output 2003.

In accordance with this arrangement of VSR 31-0, 
by previously setting the data in RAM, any data stored in 
the variable step shift register can be read out in any 
order so that scattered local neighboring images can be 
efficiently processed in a time division manner. 
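
The RAM-indirected read-out can be sketched in the same behavioural style. This is an assumption-laden model rather than the circuit of Fig. 13: `vsr_ram_readout` is a hypothetical name, the register length `steps` is taken as 5 purely so that positions 0, 2 and 4 exist, and position 0 is assumed to hold the most recently input datum.

```python
from collections import deque


def vsr_ram_readout(initial, inputs, msktms, tms, ram, steps):
    """Read register positions in the order stored in a small look-up RAM.

    ram[counter] gives the register position selected when the up counter
    holds `counter`; the counter cycles through 0..TMS.
    """
    reg = deque(initial, maxlen=steps)    # index 0 = newest datum
    counter = 0
    out = []
    for datum in inputs:
        reg.appendleft(datum)             # intermittent input and shift
        for _ in range(msktms + 1):
            out.append(reg[ram[counter]])  # RAM output selects the cell
            counter = 0 if counter == tms else counter + 1
    return out
```

With the register initially holding D, C, B, A, inputs E and F, and the RAM loaded with 0, 2 and 4, the model reads E, C, A, E, C, A and then F, D, B, F, D, B, as in Figs. 14 and 15. Loading a different table, for example 4, 2, 0, would reverse the read-out order without any change to the register itself.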

Fig. 16 shows an arrangement for performing a
3 x 3 local neighboring image data operation every three
machine cycles in a time division manner using one main
module 10 shown in Fig. 1. In this arrangement, each of
VSR's 31-0, 31-1 and 31-2 preserves 1 x 3 local neighboring
image data in three time division processings, and
these VSR's are arranged in a manner of 3 x 1 by switching
selectors 33-0 and 33-1. Thus, as a whole, 3 x 3 local
neighboring image data are preserved in these VSR's. This
arrangement is implemented in such a way that the control
circuit 21 is externally operated so that MSKTMS and TMS
are set at 2 and selectors 33-0 and 33-1 can select data
lines 200 and 201, respectively. It should be noted that
only one main module 10 is used so that the data is not
required to be sent to image data output port 55 through
selector 70.

Input image 1 is raster-scanned once in three
machine cycles and is fed to VSR 31-0 and line buffer 20-0
through image data input port 54, one image data during every
three machine cycles. Line buffer 20-0 delays the image
data by the time required to scan one line of the input
image 1. The output from line buffer 20-0 is fed to VSR
31-1 and line buffer 20-1. Line buffer 20-1, like line
buffer 20-0, delays the image data by the time required to
scan one line of the input image 1 and supplies the delayed
image data to VSR 31-2. VSR's 31-0, 31-1 and 31-2 take in
one image data once in three machine cycles and shift
them, respectively. Then, nine local neighboring image
data A, B, C, D, E, F, G, H and I required to calculate
one image data of output image 2 are preserved inside
VSR's 31-0, 31-1, 31-2 during the three machine cycles.

The local neighboring image data preserved in
VSR's 31-0, 31-1 and 31-2 are read out in a time division
manner during the three machine cycles, and fed to
processor elements (PE's) 37-0, 37-1 and 37-2 (Fig. 3) in
parallel operating section 30. In PE's 37-0, 37-1 and
37-2, arithmetics are performed between the image data
supplied from VSR's 31-0, 31-1 and 31-2 and the coefficient
data supplied from the corresponding coefficient
memories 36-0, 36-1 and 36-2. The operation results
thus obtained are unified in arithmetic element 38. In
this way, the operation results of the image data constituting
a local neighboring image are fetched from arithmetic
element 38 in three divided parts, unified in
circuit 40 during three machine cycles and output from
main module 10 as output image 2.
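
The three-cycle data flow can be illustrated with the Python sketch below. The text speaks only of "arithmetics", so the multiply-and-add used here (as in a 3 x 3 convolution) is an assumption, as are the function name `time_division_3x3` and the row-by-column layout of `window` and `coeffs`.

```python
def time_division_3x3(window, coeffs):
    """Accumulate a 3 x 3 weighted sum over three machine cycles.

    window[row] models the 1 x 3 data held in one VSR; coeffs[row]
    models the matching coefficient memory. Each cycle, the three
    PE's each handle one element and their results are combined.
    """
    total = 0                                  # stands in for unifying circuit 40
    for cycle in range(3):                     # one VSR step per machine cycle
        partial = sum(                         # stands in for arithmetic element 38
            window[row][cycle] * coeffs[row][cycle] for row in range(3)
        )
        total += partial
    return total
```

For example, with `window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and all coefficients equal to 1, the three partial results are 12, 15 and 18, and the unified result is 45.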

Fig. 17 shows an arrangement for performing a
3 x 3 local neighboring image data operation every
machine cycle using three main modules 10, one of which is
shown in Fig. 1. In this arrangement, three VSR's 31 are
arranged in a manner of 1 x 3 by switching selectors 33-0
and 33-1. Also, the image data delayed from the
input image data by one line of the data by line buffer
20-0 is output from image data output port 55 by switching
selector 70 so that three main modules 10 are arranged in
a manner of 3 x 1. Thus, as a whole, 3 x 3 local neighboring
image data are simultaneously fetched. This arrangement
is implemented in such a way that the control circuit
21 is externally operated so that MSKTMS and TMS are set
at 0 and selectors 33-0, 33-1 and 70 can select data lines
300, 301 and 200, respectively. It should be noted that
each main module 10 selects the output from line buffer
20-0 by selector 70 and outputs the data on data line
200 from image data output port 55.

Input image 1 is raster-scanned every machine
cycle. The input image data read out by the raster scanning
are supplied to the image data input port 54 of a main
module 10A. The image data delayed by one line of the
data by line buffer 20-0 in main module 10A is output from
the image data output port 55 of main module 10A, and fed
to the image data input port 54 of a main module 10B.
In the same manner, the image data delayed by a further one
line of the data is delivered from main module 10B to
main module 10C. The arithmetic result output from the
operation data output port 65 of main module 10A is applied
to the arithmetic data input port 64 of main module 10B
and is unified with the operation result of parallel
operation section 30 by unifying circuit 40 in main module
10B. In the same way, the operation result is delivered
from main module 10B to main module 10C and is unified
with the operation result of parallel processor 30 in
main module 10C, and the unified data is output as an
output image data every machine cycle from the operation
data output port 65.

Inside the main modules 10A, 10B and 10C, the
respective image data are input in VSR 31-0, and sequentially
shifted to VSR's 31-1 and 31-2. Thus, 3 x 3 local neighboring
data A, B, C, D, E, F, G, H and I are simultaneously
preserved in the total nine VSR's 31 in the three main
modules 10. The arithmetics thereof are performed by the
total three parallel processor sections 30 during one
machine cycle.

Fig. 18 shows an arrangement for performing a
7 x 7 local neighboring operation every seven machine
cycles using three main modules connected in the same way
as in Fig. 17. In this arrangement, each of VSR's 31-0
to 31-2 preserves 1 x 7 local neighboring image data in
seven time division processings, and these three VSR's
are arranged in a manner of 3 x 1 by switching selectors
33-0 and 33-1. Thus, 3 x 7 local neighboring data are
preserved in these VSR's for one main module. The image
data delayed from the input image data by two lines of
the data is output from image data output port 55 by
switching the selector 70 so that these three main modules
10 are arranged in a manner of 3 x 1. However, the size
of the local neighboring image data to be fetched is not
9 x 7 but 7 x 7. This is because one line of the image
data is repeated in the adjacent main modules. The
repetitions can be obviated by providing three line buffers
20 in one main module 10.

This arrangement is implemented in such a way
that the control circuit 21 is externally operated so that
MSKTMS and TMS are set at 6 and selectors 33-0, 33-1 and
70 can select data lines 200, 201 and 201, respectively.
It should be noted that each main module 10 selects the
output from line buffer 20-1 by selector 70 and outputs
the data on data line 201 from image data output port 55.

Input image 1 is raster-scanned once in seven
machine cycles and is fed to the image data input port 54
of main module 10A, one pixel during every seven machine
cycles. The image data delayed from the input image by two
lines of the data by line buffers 20-0 and 20-1 in main
module 10A is output from image data output port 55 thereof
and fed to the image data input port 54 of main module 10B.
In the same way, the image data delayed by a further two
lines of the data is delivered from main module 10B to
main module 10C. The operation result output from the
operation data output port 65 of main module 10A is fed
to the operation data input port 64 of main module 10B
and unified with the operation result of the parallel
processing section 30 by unifying circuit 40 inside main
module 10B. In the same way, the operation result is
delivered from main module 10B to main module 10C and
unified with the operation result of the parallel processing
unit 30 in main module 10C, and the unified data is
output from operation data output port 65 as an output image
data every seven machine cycles.

Inside main module 10A, 3x7 local neighboring 
image data are preserved in VSR's 31-0, 31-1 and 31-2 
thereof. Inside main modules 10B and 10C, 2x7 (but not 
3x7) local image data are preserved as effective image 
data in VSR's 31-1 and 31-2 thereof, respectively, during 
seven machine cycles since the image data preserved in 
the respective VSR's 31-0 are the same as the image data 
preserved in VSR's 31-2 of the respective previous step 
main modules. Thus, 7x7 local neighboring image data 
are preserved during the seven machine cycles in the total 
seven VSR's 31 in the three main modules 10A, 10B and 10C. 
The 7x7 local neighboring image data are read out 
during the seven machine cycles in a time division manner, 
and operated by the total three parallel processing sections 
30 every seven machine cycles. 
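
The row bookkeeping of the cascaded modules can be checked with a short Python sketch. The model is an assumption made for illustration: the function name `covered_rows` is hypothetical, and module k in the chain is taken to hold the rows starting at k times the inter-module line delay.

```python
def covered_rows(modules, rows_per_module, delay_lines):
    """Return (new rows contributed by each module, total distinct rows).

    Module k is assumed to hold rows k*delay_lines up to
    k*delay_lines + rows_per_module - 1 of the local region.
    """
    seen = set()
    effective = []
    for k in range(modules):
        rows = set(range(k * delay_lines, k * delay_lines + rows_per_module))
        effective.append(len(rows - seen))  # rows not already held upstream
        seen |= rows
    return effective, len(seen)
```

For the arrangement of Fig. 18, `covered_rows(3, 3, 2)` gives 3, 2 and 2 effective rows for main modules 10A, 10B and 10C and 7 distinct rows in total, in line with the seven effective VSR's counted above.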

Incidentally, by setting MSKTMS and TMS at 4 in
the above arrangement, the arithmetic of 5 x 5 local
neighboring image data can be performed every five machine
cycles. In this case, it should be noted that the selection
of the outputs from the line buffers 20 by the
selector 70 in each main module 10 is controlled by control
circuit 21.

Accordingly, in accordance with this embodiment
of this invention as mentioned above, the arithmetic of
3 x 3 local neighboring image data can be performed
using one main module 10 every three machine cycles.
Also, three kinds of arithmetic of 3 x 3, 5 x 5 and 7 x 7
local neighboring image data can be performed using three
main modules 10, without changing the manner of connecting
them, through the operation of control circuit 21.

Fig. 19 shows another arrangement of the main
module of the parallel image processor according to this
invention. In the main module shown in Fig. 19, four
VSR's 31, four arithmetic circuits (PE) inside parallel
processing unit 30, three selectors 33 and three line
buffers 20 are used, i.e. one more of each of these
components than in the main module 10 shown in Fig. 1.
The selector 33-1 is a 3-to-1 selector which selects
one of three data lines 200, 201 and 301, and so the
arrangement of VSR's 31 can be changed in three manners
of 1 x 4, 2 x 2 and 4 x 1 by switching the selector 33-1.
The selector 70 is a 4-to-1 selector which selects
one of four data lines 540, 200, 201 and 202, and
so by switching the selector 70, one of the image data
delayed from the input image data by zero, one, two, and
three lines can be selected and output from image data
output port 55.

Fig. 20 shows an arrangement for performing a
4 x 4 local neighboring image data operation every four
machine cycles in a time division manner using one main
module 10 shown in Fig. 19. In this arrangement, the
circuits other than line buffers 20 and VSR's 31 are
omitted for brevity's sake. Also in this arrangement,
one VSR 31 preserves 1 x 4 local neighboring image data in
four time division processings, and four VSR's 31 are
arranged in a manner of 4 x 1 by switching selectors 33.
Thus, as a whole, 4 x 4 local neighboring image data are
preserved in these VSR's. This arrangement is implemented
in such a way that the control circuit 21 is externally
operated so that MSKTMS and TMS are set at 3 and selectors
33-0, 33-1 and 33-2 can select data lines 200, 201 and 202,
respectively.

Fig. 21 shows an arrangement for performing a
4 x 4 local neighboring image data operation every
machine cycle using four main modules, one of which is shown
in Fig. 19. In this arrangement, four VSR's 31 are arranged
in a manner of 1 x 4 by switching selectors 33 shown in
Fig. 19. Also, in each module the image data delayed
from the input image data by one line of the data is
output from the image data output port by switching the
selector 70 so that four modules 10 are arranged in a
manner of 4 x 1. Thus, as a whole, 4 x 4 local neighboring
image data can be simultaneously fetched. This arrangement
is implemented in such a way that the control circuit 21
is externally operated so that MSKTMS and TMS are set at
0 and selectors 33-0, 33-1, 33-2 and 70 can select data
lines 300, 301, 302 and 200, respectively. It should be
noted that selector 70 in Fig. 19 serves to select line
buffer 20-0 and output the data on data line 200 from
image data output port 55.

In the arrangement of Fig. 21, an input image
data is applied to image data input port 54 of main module
10A. The image data delayed from the input image data by
one line of the data is output from image data output port
55 of main module 10A and applied to image data input port
54 of main module 10B. In the same way, the image data
is delivered from main module 10B to main module 10C and
from main module 10C to main module 10D. Moreover, the
operation result output from operation data output port 65
of main module 10A is applied to the operation data input
port 64 of main module 10B. In the same way, the operation
result is delivered from main module 10B to main module
10C and from main module 10C to main module 10D. Finally,
an output image data is output from the operation data
output port 65 of main module 10D every machine cycle.

Fig. 22 shows an arrangement for performing an
8 x 8 local neighboring image data operation every four
machine cycles using four main modules 10 connected in the
same manner as in Fig. 21. In this arrangement, each VSR
preserves 1 x 4 local neighboring data in four time
division processings, and these four VSR's 31 are arranged
in a manner of 2 x 2 by switching selectors 33 in Fig. 19.
Thus, 2 x 8 local neighboring data are preserved for one
main module. The image data delayed from the input image
data by two lines of the data is output from image data
output port 55 by switching the selector 70 so that these
four main modules 10 are arranged in a manner of 4 x 1.
As a whole, 8 x 8 local neighboring image data are preserved
in this arrangement. This arrangement is implemented in
such a way that the control circuit 21 is externally operated
so that MSKTMS and TMS are set at 3 and selectors 33-0,
33-1, 33-2 and 70 can select data lines 300, 200, 302 and
201, respectively. It should be noted that the main
module 10 in Fig. 19 selects the output from line buffer
20-1 by means of selector 70 and outputs the data on data
line 201 through image data output port 55.

In the arrangement of Fig. 22, input image 1 is
raster-scanned once in four machine cycles and is fed to
the image data input port 54 of main module 10A, one
pixel during every four machine cycles. The image data
delayed from the input image by two lines of the data by
line buffers 20-0 and 20-1 in main module 10A is output
from image data output port 55 thereof and fed to the
image data input port 54 of main module 10B. In the same
way, the image data delayed by a further two lines of the
data is delivered from main module 10B to main module 10C,
and moreover from main module 10C to main module 10D. The
operation result is delivered from main module 10D as an
output image data every four machine cycles.

In accordance with this embodiment of this
invention as mentioned above, the arithmetic of 4 x 4 local
neighboring image data can be performed using one main
module 10 every four machine cycles. Also, several
kinds of arithmetic of local neighboring image data, 4 x 4,
8 x 8, etc., can be performed using plural main modules 10,
without changing the manner of connecting them, through the
external operation of control circuit 21.

Fig. 23 shows still another arrangement of the
main module 10 of the parallel image processor, which
includes three line buffers 20, nine VSR's and also nine
processor elements (PE) 37 in parallel processor section 30.

Fig. 24 shows an arrangement for performing a 3 x 3
local neighboring image data operation every machine
cycle using one main module 10. Fig. 25 shows an arrangement
for performing a 3 x 9 local neighboring image data
operation every three machine cycles in a time division
manner by means of the same hardware as in Fig. 24.

Fig. 26 shows an arrangement for performing a
9 x 9 local neighboring image data operation every
machine cycle using nine main modules 10.

An image data f is applied to the image data
input port 54 of main module 10A. It is also delayed by
three pixels by a shift register 3 and applied to the image
data input port 54 of main module 10B. The delayed image data
is further delayed by three pixels by a shift register 4 and
applied to the image data input port 54 of main module 10C.
The image data delayed from the input image data f by three
lines of the data, output from the respective image data
output ports 55 of main modules 10A, 10B and 10C, are
applied to the image data input ports 54 of main modules
10D, 10E and 10F, respectively. The image data delayed
from the input image data f by six lines of the data,
output from the respective image data output ports 55 of
main modules 10D, 10E and 10F, are applied to the image
data input ports 54 of main modules 10G, 10H and 10I,
respectively. Further, the operation result output from
the operation data output port 65 of main module 10A is
applied to the operation data input port 64 of main
module 10D. In the same way, the operation result is
delivered from main module 10D to main module 10G, from
10G to 10B, and further to 10E, 10H, 10C, 10F and 10I.
Finally, an output image data g is output from the
operation data output port 65 of main module 10I every
machine cycle.
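
The tiling of the 9 x 9 region by the nine modules can be verified with the Python sketch below. It is a bookkeeping model only: the names `module_offsets`, `covered_cells` and `RESULT_CHAIN` are hypothetical, each module is assumed to handle a 3 x 3 block, and the offsets are the line and pixel delays stated above.

```python
def module_offsets():
    """Map each main module to its assumed (line delay, pixel delay)."""
    offsets = {}
    for index, letter in enumerate("ABCDEFGHI"):
        row_block, col_block = divmod(index, 3)
        # A, B, C share a line delay; D, E, F add three lines; G, H, I six.
        offsets["10" + letter] = (3 * row_block, 3 * col_block)
    return offsets


def covered_cells(offsets, size=3):
    """Count how often each (line, pixel) cell of the region is covered."""
    counts = {}
    for line0, pixel0 in offsets.values():
        for d_line in range(size):
            for d_pixel in range(size):
                cell = (line0 + d_line, pixel0 + d_pixel)
                counts[cell] = counts.get(cell, 0) + 1
    return counts


# Order in which the operation result is handed from module to module.
RESULT_CHAIN = ["10A", "10D", "10G", "10B", "10E", "10H", "10C", "10F", "10I"]
```

Under these assumptions the nine blocks cover all 81 cells of the 9 x 9 region exactly once, and the result chain visits every module once before the output emerges from main module 10I.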

Fig. 27 shows an arrangement for performing a
9 x 9 local neighboring image data operation every three
machine cycles in a time division manner using three main
modules 10. In this arrangement, the same 9 x 9 local
neighboring image data operation as in the arrangement of
Fig. 26 can be realized by means of an amount of hardware
which is 1/3 of that of the latter.

In accordance with this embodiment of this
invention, the arithmetic of 3 x 3 local neighboring
image data can be performed using one main module 10
every machine cycle. Also, by using plural main
modules 10, the arithmetic employing a larger local image
region, e.g. zero-crossing operation, pattern matching,
etc., can be performed every machine cycle. Further,
the arithmetic employing a larger local image region can be
performed by a smaller amount of hardware in a time division
processing.
Thus, several embodiments of this invention have
been explained above. It should be noted in each embodiment
that the respective numbers of line buffers 20, VSR's 31,
and processor elements (PE) 37 in parallel processor section
30 can be determined as required in relation to the degree
of integration of LSI. If, with m or m-1 line buffers
and m arithmetic circuits being provided in the main
module, such a single main module is used for the time
division processing in n cycles, the processing of m x n
local neighboring image data can be performed in n machine
cycles. Or if n of the above-mentioned main modules are
arranged for the parallel processing of the respective line
buffer outputs selected by selector 70, one for each main
module, the processing of n x m local neighboring image data
can be performed in one machine cycle.

Further, merely by switching selectors 70 and 33
with the n main modules provided, the time division
processing of (m x n) rows x t columns can be performed at
the maximum (in this case, t machine cycles and an
arrangement of VSR's of t steps are required).

A wider variety of parallel processing can be
performed at a high speed by providing m x n arithmetic
circuits 37.

Thus, the parallel image processor according to
this invention can be flexibly adapted to the conflicting
needs of users: that a large amount of image data be
processed at a high speed, or that it be processed by a
small amount of hardware although more time may be taken.

(1) In accordance with this invention, the local
neighboring image region to be subjected to a local
neighboring image data processing can be easily expanded
without the need for externally equipped circuits and
complicated controls.

(2) In accordance with this invention, the local
neighboring image operations for various local neighboring
image regions can be realized by altering the construction
of each of the main modules through the operation of a
control circuit provided therein and without altering the
connecting manner of the main modules.

(3) In accordance with this invention, the amount of
hardware used can be greatly reduced by implementing each
main module as an LSI.

CLAIMS:

1. A parallel image processor consisting of at
least one main module (10) for performing a parallel
operation of local neighboring image data on the basis
of input image data externally taken in, comprising:

at least one data memory (20-i) for delaying
the input image data by one line of the data in order;
and

an output port (55) for fetching the delayed
image data and feeding it to the other main module
as an input image data for connection between the main
modules.

2. A parallel image processor consisting of at
least one main module for performing a parallel operation
of local neighboring image data on the basis of
externally input image data, comprising:

at least one data memory (20-i) for 
delaying the input image data by one line of the data 
in order; 

a selector (70) for selectively switching 
the externally input image data and the delayed image 
data; and 

an output port (55) for fetching the selected
image data and feeding it to the other main module as
input image data for connection between the main
modules.
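
As an illustration only, the one-line delay and the selector recited in Claims 1 and 2 can be modeled in software. The following is a minimal sketch, assuming a fixed line width and a zero-filled initial buffer; the names `LineBuffer`, `main_module_output`, and `select_delayed` are hypothetical and do not appear in the specification.

```python
from collections import deque


class LineBuffer:
    """Software model of a data memory that delays a pixel stream by one line."""

    def __init__(self, line_width, fill=0):
        # The FIFO always holds exactly line_width pixels (assumed zero-filled at start).
        self._fifo = deque([fill] * line_width, maxlen=line_width)

    def step(self, pixel):
        delayed = self._fifo[0]   # pixel that entered line_width steps earlier
        self._fifo.append(pixel)  # maxlen discards the oldest pixel automatically
        return delayed


def main_module_output(pixel, line_buffer, select_delayed):
    """Model of the selector feeding the output port.

    The line buffer is clocked on every pixel; the selector then passes either
    the externally input pixel or the pixel delayed by one line.
    """
    delayed = line_buffer.step(pixel)
    return delayed if select_delayed else pixel


buffer = LineBuffer(line_width=4)
cascade_stream = [main_module_output(p, buffer, True) for p in range(1, 9)]
```

With a line width of 4, the first four outputs are the fill value and the fifth output is the first input pixel, which is the stream a succeeding main module would receive.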

3. A parallel image processor as claimed in Claim
1 or 2, wherein said main module comprises:

at least one sequential memory means (31-i)
for storing local neighboring image data sequentially
cut out from the input image data;

a parallel operation unit (30) for performing
a parallel operation of the local image data;
and

unifying means (40) for unifying the results 
of the parallel operation and outputting the unified 
result. 

4. A parallel image processor as claimed in 
Claim 3, wherein said parallel operation unit (30) 
consists of a plurality of processor elements (37-i) 
and a plurality of coefficient memories for storing 
coefficient data corresponding to the processor 
elements, said sequential memory means (31-i) 
consists of a plurality of memory elements corresponding 
to the processor elements, and the local image data
cut out from the sequential memory means and the
coefficient data are operated in parallel in the
corresponding processor elements.

5. A parallel image processor as claimed in 
Claim 3, wherein said sequential memory means is 
constructed by shift registers. 

6. A parallel image processor as claimed in
Claim 1 or 2, wherein said data memory is constructed
by RAMs or by shift registers.

7. A parallel image processor consisting of
at least one main module (10) for performing the
parallel operation of m x n (m, n: integer) local
neighboring image data cut out from externally
input image data, comprising:

at least (m - 1) line buffers (20-i) for
delaying said input image data in order by one line
of the data;

sequential memory means (31-i) consisting
of m x n steps, for storing the local neighboring
image data sequentially cut out from the input image
data or the delayed image data;

a parallel operation section (30) including 
m processor elements for performing the parallel 
operation of the local neighboring image data; and 

a unifying circuit (40) for unifying the 
results of the parallel operation in n machine cycles 
and outputting the unified result. 
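
The time-division operation recited in Claim 7 can be sketched in software as follows. This is an illustrative model only: it assumes that each processor element multiplies a pixel by a coefficient (as suggested by the coefficient memories of Claim 8) and that the unifying circuit sums the results, neither of which is fixed by the claim wording. The function name `process_window` is hypothetical.

```python
def process_window(window, coeffs):
    """Process an m x n neighborhood with m processor elements in n machine cycles.

    window -- m rows of n pixels (one row per sequential memory means)
    coeffs -- m rows of n coefficients (one row per coefficient memory)
    Returns the unified result and the number of machine cycles used.
    """
    m = len(window)
    n = len(window[0])
    unified = 0
    cycles = 0
    for j in range(n):
        # One machine cycle: the m processor elements each handle one pixel.
        partial = [window[i][j] * coeffs[i][j] for i in range(m)]
        # Unifying step (assumed here to be a sum accumulated over the cycles).
        unified += sum(partial)
        cycles += 1
    return unified, cycles


window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity_diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
result, cycles = process_window(window, identity_diagonal)
```

For a 3 x 3 window the loop runs three times, matching the n machine cycles stated in the claim, while only m = 3 multiplications are performed per cycle.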

8. A parallel image processor as claimed in
Claim 7, wherein said parallel operation section (30)
comprises m coefficient memories (36-i) for coefficient
data corresponding to the processor elements
(37), respectively.

9. A parallel image processor consisting of at
least one main module (10) for performing a parallel
operation of local neighboring image data cut out
from externally input image data, comprising:

an input port (54) for externally taking in
input image data;

at least (m - 1) line buffers for delaying the
input image data in order by one line of the data;

m sequential memory means having a variable
number of steps, for storing the local neighboring
image data sequentially cut out from the input image 
data or the delayed image data; 

(m - 1) first selectors (33-i) for selectively
switching the outputs from the line buffers
and the outputs from the sequential memory means
and supplying them to a succeeding sequential memory
means;

a parallel operation section (30) comprising 
m processor elements (37) for performing the parallel 
operation of the local neighboring image data output 
from the corresponding sequential memory means; 

unifying means (40) for unifying the results 
of the parallel operation and outputting the unified 
result; 

a second selector (70) for selectively 
switching the externally input image data and the 
image data delayed by the line buffers; 

an output port (55) for fetching the
image data selected by the second selector; and

a control circuit (21) for supplying control 
signals to said first and second selectors. 

10. A parallel image processor as claimed in
Claim 9, wherein said parallel operation section (30)
comprises m coefficient memories for coefficient data
corresponding to the processor elements (37),
respectively, and the local neighboring image data cut
out from the sequential memory means and the coefficient
data are operated in parallel in the corresponding
processor elements.

11. A parallel image processor as claimed in
Claim 9, wherein said sequential memory means
intermittently performs a shift operation in response to
clock signals and reads out the memory content at each
clock signal.

12. A parallel image processor as claimed in
Claim 9, wherein said line buffers comprise information
memory sections (241, 242) permitting at least
one bit to be simultaneously read out and written in,
and a row address control section (245) for controlling
the row addresses of said information memory sections,
and wherein the starting and ending row addresses for
read-out and write-in of said information memory sections
are determined in accordance with the control signals
supplied to the row address control section so as to make
variable the number of delay steps.
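
The variable delay of Claim 12 can be pictured as a memory whose address pointer cycles between a starting and an ending row address. The sketch below is an assumption-laden software model, not the circuit of the specification: it models the simultaneous read-out and write-in as a read followed by a write at the same address, and the names `VariableDelayBuffer`, `start_addr`, and `end_addr` are hypothetical.

```python
class VariableDelayBuffer:
    """Memory whose delay equals the number of addresses between start and end."""

    def __init__(self, capacity, start_addr, end_addr, fill=0):
        if not (0 <= start_addr <= end_addr < capacity):
            raise ValueError("addresses must satisfy 0 <= start <= end < capacity")
        self._mem = [fill] * capacity
        self._start = start_addr
        self._end = end_addr
        self._addr = start_addr

    @property
    def delay(self):
        # Number of delay steps selected by the control signals.
        return self._end - self._start + 1

    def step(self, value):
        delayed = self._mem[self._addr]  # read out the old content
        self._mem[self._addr] = value    # write in the new content at the same address
        # Advance the address, wrapping from the ending to the starting address.
        self._addr = self._start if self._addr == self._end else self._addr + 1
        return delayed


buf = VariableDelayBuffer(capacity=8, start_addr=2, end_addr=4)
delayed_stream = [buf.step(v) for v in range(1, 7)]
```

Changing only the two addresses changes the delay, so the same memory can serve images of different line widths without rewiring.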

13. A parallel image processor as claimed in
Claim 9, further comprising an operation data input
port (64) for taking in an operation result externally
provided which is unified with the operation result
obtained from the parallel operation section in said
unifying means (40), and an operation data output
port (65) for externally outputting the unified result.

14. A parallel image processor as claimed in
Claim 9, wherein each of said sequential memory means
in said main module is constructed by n steps and
said first selectors are switched to the outputs from
the line buffers, so that m x n local neighboring image
data are processed in a time division manner in n machine
cycles.

15. A parallel image processor as claimed in
Claim 9, consisting of n main modules, wherein n x m
local neighboring image data are processed during one
machine cycle in such a state that the image data
output ports (55) are connected with the input ports
of succeeding main modules, the sequential memory
means each have one step, the first selectors
(33-i) are switched to the outputs from the sequential
memory means, and the second selector (70) is switched
to the output from the line buffer delayed by one line
of the data.

16. A parallel image processor as claimed in
Claim 9, consisting of an arrangement of n main
modules, wherein a maximum of (m x n) x t local
neighboring image data are processed in a time division
manner in t machine cycles in such a state that the image
data output ports are connected with the input ports of
succeeding main modules, the sequential memory means
each have t steps, the first selectors (33-i) are
switched to the outputs from the line buffers, and the
second selector (70) is switched to either one of the
outputs from the line buffers.
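
The arrangements of Claims 14 to 16 differ only in the number of cascaded main modules and the number of steps set in each sequential memory means. The relation between these settings, the neighborhood size, and the processing time can be tabulated with a short sketch; the function name `configuration` and its parameter names are hypothetical, and the example values of m, n, and t are chosen only for illustration.

```python
def configuration(m, modules, steps):
    """Return (maximum neighborhood size, machine cycles) for one arrangement.

    m       -- processor elements (and sequential memory means) per main module
    modules -- number of main modules connected in cascade
    steps   -- number of steps set in each sequential memory means
    """
    if min(m, modules, steps) < 1:
        raise ValueError("all parameters must be positive integers")
    # Each machine cycle handles m pixels per module; steps cycles are needed.
    return m * modules * steps, steps


n, t = 3, 4
claim_14 = configuration(m=3, modules=1, steps=n)  # one module, n steps
claim_15 = configuration(m=3, modules=n, steps=1)  # n modules, one step
claim_16 = configuration(m=3, modules=n, steps=t)  # n modules, t steps
```

With m = 3, n = 3, and t = 4, the single module of Claim 14 covers 9 pixels in 3 cycles, the cascade of Claim 15 covers 9 pixels in 1 cycle, and the arrangement of Claim 16 covers up to 36 pixels in 4 cycles.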